
Poster: Peter Frouman Date: Apr 26, 2013 5:30pm
Forum: web Subject: Re: Concerned that my blog has been added to your archive

Albertina,
You have not actually followed the instructions at http://archive.org/about/exclude.php
Your robots.txt file at http://greenford365.wordpress.com/robots.txt does not actually direct the IA crawler to not archive your entire site but only directs all crawlers to not crawl or index certain parts of your site. However, it seems that wordpress.com users cannot actually edit their own robots.txt files to exclude certain crawlers. You could change your wordpress.com settings to exclude all search engines but that's probably not what you want (it seems like you only want to exclude the Internet Archive crawler). The inability of regular users to edit a site specific robots.txt file appears to be a limitation of the wordpress.com hosting service. Unless you move your site somewhere else, you'll probably need to find another way to request removal. Since you are apparently unable to modify your robots.txt file, you should email info@archive.org as stated at http://archive.org/about/exclude.php
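As a rough sketch of the difference between blocking parts of a site for all crawlers and excluding one crawler entirely, the Python snippet below uses the standard library's urllib.robotparser to compare the two kinds of robots.txt. The user-agent name ia_archiver, the sample rules, and the example.com URLs are assumptions made for illustration; check http://archive.org/about/exclude.php for the exact directives the Internet Archive asks for.

```python
from urllib.robotparser import RobotFileParser


def blocks_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules: every crawler is kept out of one directory only.
PARTIAL = """\
User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical rules: one named crawler (assumed here to be "ia_archiver")
# is excluded from the whole site; other crawlers are unaffected.
EXCLUDE_IA = """\
User-agent: ia_archiver
Disallow: /
"""

POST_URL = "http://example.com/2013/04/some-post/"  # placeholder URL

if __name__ == "__main__":
    print(blocks_crawler(PARTIAL, "ia_archiver", POST_URL))     # partial rules: post still crawlable
    print(blocks_crawler(EXCLUDE_IA, "ia_archiver", POST_URL))  # full exclusion for the named crawler
    print(blocks_crawler(EXCLUDE_IA, "Googlebot", POST_URL))    # other crawlers not affected
```

With the first file, ordinary posts remain crawlable by every robot that honors the standard; only the second file tells the named crawler to stay away from the whole site.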

Posting here is very unlikely to result in what you want (removal of your site from the Internet Archive).

By the way, I have no connection at all to the Internet Archive project. I am just trying to be helpful by pointing you to the documentation.

What the Internet Archive does is very clearly absolutely legal under the "fair use" provisions of U.S. copyright law [http://www.copyright.gov/title17/92chap1.html#107] and similar laws in other countries. This has been tested in U.S. courts. See Field v. Google, 412 F. Supp. 2d 1106 (D. Nev. Jan. 19, 2006) [http://fairuse.stanford.edu/primary_materials/cases/fieldgoogle.pdf]. The Internet Archive v. Shell case [http://blog.ericgoldman.org/archives/waybackshell.pdf] someone else cited is not indicative of anything other than the Internet Archive deciding not to waste any more resources dealing with a vexatious pro se litigant making feckless and absurd claims. All they ended up doing is what they would have done anyway (removed the content from the archive) had she made a single polite request rather than making outrageous and feckless demands. Shell did not get a cent from the Internet Archive (she initially demanded $100,000) and nearly all her meritless claims were dismissed on summary judgment.


Poster: jory2 Date: May 1, 2013 5:33am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

@ Peter Frouman on April 26 you referred to a private company's internal policy: "See http://archive.org/about/exclude.php for instructions on how to exclude your site from the archive."
The same day, you continued with the following uselessness:
"Albertina,
You have not actually followed the instructions at http://archive.org/about/exclude.php
Your robots.txt file at http://greenford365.wordpress.com/robots.txt does not actually direct the IA crawler to not archive your entire site but only directs all crawlers to not crawl or index certain parts of your site. However, it seems that wordpress.com users cannot actually edit their own robots.txt files to exclude certain crawlers. You could change your wordpress.com settings to exclude all search engines but that's probably not what you want (it seems like you only want to exclude the Internet Archive crawler)."

So for the record: the robots.txt file is VOLUNTARY for both the site owner and those who scrape data from the net. Further to this, it's not a legal requirement, and rightfully so; it is, after all, VOLUNTARY.

Hey Peter, how about we take a look at a group of folks that scrape the net for the IA "collections".

http://blog.archive.org/2013/01/09/updated-wayback/
"The prolific volunteers of Archive Team spent a lot of time this year archiving web sites on the verge of disappearing and then contributing those records to Internet Archive."

Archiveteam says: ROBOTS.TXT IS A SUICIDE NOTE

"ROBOTS.TXT is a stupid, silly idea in the modern era. Archive Team entirely ignores it and with precisely one exception, everyone else should too.

If you do not know what ROBOTS.TXT is and you run a site... excellent. If you do know what it is and you have one, delete it. Regardless, Archive Team will ignore it and we'll delete your complaints, just like you should be deleting ROBOTS.TXT."
http://www.archiveteam.org/index.php?title=Robots.txt

You say:
"Posting here is very unlikely to result in what you want (removal of your site from the Internet Archive)."

I agree 100%. She should have filed a DMCA takedown notice with this site, any hosting company, and any payment processor(s).

You say:
"By the way, I have no connection at all to the Internet Archive project. I am just trying to be helpful by pointing you to the documentation."

Helpful? Firstly, you would need to have your facts straight and, at the very least, know what you're talking about in order to be helpful. No offense.


Poster: Peter Frouman Date: May 1, 2013 7:36am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

jory2,
The Robot Exclusion Standard is indeed voluntary/advisory and no one is obliged to use it and there is no legal requirement for crawlers to honor it. However, the Internet Archive crawler software (Heritrix) does indeed honor it by default. If you doubt the official statements of IA on this, you can easily verify it by running the software yourself or examining its source code. Unless you're just trolling (which seems quite likely), I'm baffled as to why you seem to think that the official IA policy statements about the behaviour of their crawler are unreliable but the statements of random individual users editing a wiki for a completely separate project (that has apparently contributed volunteer time to IA projects) are somehow more reliable. Using the Robot Exclusion Standard is currently the most efficient and reliable method for publishers to communicate their preferences regarding the indexing and archiving of content they publish. Posting random complaints on a forum is much less likely to have the desired result.

DMCA notices can indeed be a useful tool to request removal of actual infringements. However, submitting DMCA notices with false statements to request the removal of non-infringing content is very risky as it can result in both civil liability and criminal penalties. I would suggest that anyone considering filing a DMCA notice get legal counsel from a licensed attorney with expertise in U.S. intellectual property law. Getting help from some random anonymous and completely unqualified person to draft a DMCA notice could turn out to be a very expensive mistake.

It should also be noted that recipients of DMCA notices have no legal obligation to respond to them or take any action (including removal) in response to them. They merely have the option to do so to possibly avoid potential liability but if there is no infringement, there is no liability (except possibly for legal expenses that can't be collected from the losing party).



Poster: jory2 Date: May 1, 2013 10:40am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

Peter Frouman,

"However, the Internet Archive crawler software (Heritrix) does indeed honor it by default. If you doubt the official statements of IA on this, you can easily verify it by running the software yourself or examining its source code."

Actually this link:

https://webarchive.jira.com/wiki/display/wayback/Home

is much more useful.

"Unless you're just trolling (which seems quite likely)"

Ah! I wondered how long it would take before you pulled that lame ass shit! You guys need new material!

So for the record, Trolling I don't know, Copyright / Intellectual Property I know well.

"I'm baffled as to why you seem to think that the official IA policy statements about the behaviour of their crawler are unreliable"

Check your facts, the IA tells you themselves that their software is not 100% reliable. Do you need direction to the exact page where you (and everyone else) can verify this info?
It's within the IA blog.

"but the statements of random individual users editing a wiki for a completely separate project (that has apparently contributed volunteer time to IA projects) are somehow more reliable."

And again you fail and fail miserably with that attempt too! Here's a direct quote from the IA blog:

"The prolific volunteers of Archive Team spent a lot of time this year archiving web sites on the verge of disappearing and then contributing those records to Internet Archive."

http://blog.archive.org/2013/01/09/updated-wayback/

"Using the Robot Exclusion Standard is currently the most efficient and reliable method for publishers to communicate their preferences regarding the indexing and archiving of content they publish. Posting random complaints on a forum is much less likely to have the desired result."

Like many, I would argue the law is the "most efficient and reliable method for publishers to communicate their preferences regarding the indexing and archiving of content they publish.".

"DMCA notices can indeed be a useful tool to request removal of actual infringements. However, submitting DMCA notices with false statements to request the removal of non-infringing content is very risky"

Fail and fail again; nowhere did I mention "submitting DMCA notices with false statements", because it can indeed result in civil liability.

"I would suggest that anyone considering filing a DMCA notice get legal counsel from a licensed attorney with expertise in U.S. intellectual property law. Getting help from some random anonymous and completely unqualified person to draft a DMCA notice could turn out to be a very expensive mistake."

I bet you would! Who cares? Who are you? And do you really think people are not going to read for themselves and will just take your word for it?

http://www.chillingeffects.org/, wonderful step-by-step instructions! Helpful with counter-notices as well!
Created by the EFF and Google if I'm not mistaken?

"It should also be noted that recipients of DMCA notices have no legal obligation to respond to them or take any action (including removal) in response to them. They merely have the option to do so to possibly avoid potential liability but if there is no infringement, there is no liability (except possibly for legal expenses that can't be collected from the losing party)."

Now I know without doubt that you have no idea what the f@$K you're talking about, which is the only reason I added to this messy thread in the first place; you're 100% wrong on so many key points it's unbelievable!

People will, and can read for themselves, all the information is freely available for anyone interested.

Peter Frouman, you are either clueless or completely confused by the information you're reading, which is entirely possible. It's one thing to read Bills/Acts/Laws ... but if you're not comprehending.

In any case, it's been interesting.


Poster: jory2 Date: Apr 30, 2013 8:56am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

@Peter Frouman:

"What the Internet Archive does is very clearly absolutely legal under the "fair use" provisions of U.S. copyright law"

First off, "fair use" is a legal defense, and it can only be determined by the courts, not by a private company or the on line personal diaries you got confused by.

Under the REAL U.S Copyright Act:
§ 108. Limitations on exclusive rights: Reproduction by libraries and archives

(a) Except as otherwise provided in this title and notwithstanding the provisions of section 106, it is not an infringement of copyright for a library or archives, or any of its employees acting within the scope of their employment, to reproduce no more than one copy or phonorecord of a work, except as provided in subsections (b) and (c), or to distribute such copy or phonorecord, under the conditions specified by this section, if —
(1) the reproduction or distribution is made without any purpose of direct or indirect commercial advantage;
(2) the collections of the library or archives are (i) open to the public, or (ii) available not only to researchers affiliated with the library or archives or with the institution of which it is a part, but also to other persons doing research in a specialized field; and
(3) the reproduction or distribution of the work includes a notice of copyright that appears on the copy or phonorecord that is reproduced under the provisions of this section, or includes a legend stating that the work may be protected by copyright if no such notice can be found on the copy or phonorecord that is reproduced under the provisions of this section.
(b) The rights of reproduction and distribution under this section apply to three copies or phonorecords of an unpublished work duplicated solely for purposes of preservation and security or for deposit for research use in another library or archives of the type described by clause (2) of subsection (a), if —
(1) the copy or phonorecord reproduced is currently in the collections of the library or archives; and
(2) any such copy or phonorecord that is reproduced in digital format is not otherwise distributed in that format and is not made available to the public in that format outside the premises of the library or archives.
(c) The right of reproduction under this section applies to three copies or phonorecords of a published work duplicated solely for the purpose of replacement of a copy or phonorecord that is damaged, deteriorating, lost, or stolen, or if the existing format in which the work is stored has become obsolete, if —
(1) the library or archives has, after a reasonable effort, determined that an unused replacement cannot be obtained at a fair price; and
(2) any such copy or phonorecord that is reproduced in digital format is not made available to the public in that format outside the premises of the library or archives in lawful possession of such copy.

http://www.copyright.gov/title17/92chap1.html#108

Do you spot the obvious "errors" on behalf of the IA, Peter Frouman? Or do you (like the IA) expect everyone to pretend not to comprehend what they're reading?


Poster: Peter Frouman Date: Apr 30, 2013 9:13pm
Forum: web Subject: Re: Concerned that my blog has been added to your archive

jory2,
It's quite strange that you would refer to the U.S. District Court opinions and orders in the cases of Parker v. Google [1] (affirmed on appeal) and Field v. Google [2] as "on line personal diaries" and it's quite clear that you are the one who is confused. It's also quite baffling that in a discussion about the fair use provisions of Section 107 of USC Title 17, you would quote the completely irrelevant (to a determination of fair use) Section 108. Your arguments are almost incomprehensible because they don't make any sense and are clearly based on complete ignorance of the fair use provisions of U.S. copyright law and the numerous legal opinions and decisions covering determinations of fair use.

Fair use is indeed determined on a case by case basis by considering the four factors stated in Section 107. The cases of Parker v. Google and Field v. Google involved the Google cache archive which does about the same thing that the Internet Archive Wayback Machine and crawler does. In those cases, Google's cache was found to be a fair use. Thus, what the Internet Archive does is very clearly absolutely legal under the "fair use" provisions of U.S. copyright law. There is no uncertainty about this - the legal cases involving the Google cache have already provided the correct answer.

References:

1. http://www.paed.uscourts.gov/documents/opinions/06d0306p.pdf

2. http://fairuse.stanford.edu/primary_materials/cases/fieldgoogle.pdf


Poster: jory2 Date: May 1, 2013 3:15am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

Peter Frouman,
"It's quite strange that you would refer to the U.S. District Court opinions and orders in the cases of Parker v. Google [1] (affirmed on appeal) and Field v. Google [2] as on line personal diaries"

I didn't, you did. Nice try btw, but it doesn't apply here at all; I'll get to that later.

"It's also quite baffling that in a discussion about the fair use provisions of Section 107 of USC Title 17, you would quote the completely irrelevant (to a determination of fair use) Section 108."

Here's the WHOLE of section 107:
§ 107. Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

Next is section 108 (the info I posted):

§ 108. Limitations on exclusive rights: Reproduction by libraries and archives, the section YOU considered to be "completely irrelevant"? Really???

Now back to your links and "References". First and most important to note: this website is NOT a search engine and will NOT be legally argued as one, thus making your references completely irrelevant.

You say:
"In those cases, Google's cache was found to be a fair use. Thus, what the Internet Archive does is very clearly absolutely legal under the "fair use" provisions of U.S. copyright law. There is no uncertainty about this - the legal cases involving the Google cache have already provided the correct answer."

Google is T E M P O R A R Y, WHEREAS the IA actually makes permanent EXACT replacement copies of the original and serves them up to the public without express permission or consent.

Can you spot the difference yet????? Or ...


Poster: JaneSmith01 Date: May 1, 2013 2:02am
Forum: web Subject: Re: Concerned that my blog has been added to your archive

I'm not entirely sure. Google's cache is both temporary and incomplete, whereas the Archive endeavours to capture a complete and functional version of the pages for display in perpetuity, outside of the owner's control. It seems that under US law, Fair Use is judged in large part based upon whether or not the source is reproduced in its entirety. I don't know of any proper precedents that accurately test this provision.

I am not taking a side here, but I am interested in the ramifications of this particular incident re: privacy of user data and the ability to have it excluded without making use of the automated robot exclusion protocol in cases where, for instance, the user does not have total control over the host site's inner workings.
